Data Integration in Data Warehousing (Keynote Address)

Author

  • Diego Calvanese
Abstract

Data integration is a central problem in the design of Data Warehouses and Decision Support Systems. When data passes from the sources of the application-oriented operational environment to the Data Warehouse, possible inconsistencies and redundancies should be resolved, so that the warehouse is able to provide an integrated and reconciled view of the organization's data. Generally speaking, a data integration system combines the data residing at different sources and provides a unified, reconciled view of these data, called the global schema, which can be queried by the user. In the design of a data integration system, an important aspect is the way in which the global schema is specified, i.e., which data model is adopted and what kind of constraints on the data can be expressed. Moreover, a basic decision concerns how to specify the relation between the sources and the global schema. There are basically two approaches to this problem. The first approach, called global-as-view (GAV), requires that the global schema be expressed in terms of the data sources. More precisely, every concept of the global schema is associated with a view over the data sources, so that its meaning is specified in terms of the data residing at the sources. In the second approach, called local-as-view (LAV), the global schema is specified independently of the sources, and the relationships between the global schema and the sources are established by defining every source as a view over the global schema. The ultimate goal of a data integration system is to answer queries posed by the user in terms of the global schema. Query processing therefore depends on the form of the data integration system and, specifically, on whether the GAV or LAV approach is adopted and on the form of constraints allowed on the global schema.
In the invited talk we illustrate basic techniques for computing the correct answers to queries posed to a data integration system in various practically significant cases. We then consider the conditions that are typical of Data Warehouse applications, which restrict the large spectrum of approaches that have been proposed for integration. We discuss a data integration architecture specifically developed for this context within the IST European Project "Foundations of Data Warehouse Quality" (DWQ). The DWQ integration architecture follows the LAV approach and defines both Data Warehouse tables and source tables in terms of a global schema.
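
The difference between the GAV and LAV mappings described in the abstract can be made concrete with a small sketch. Everything in it is an assumption for illustration and is not taken from the talk: the sources `crm` and `billing`, the global relation `Customer(id, name, city)`, and the choice of mappings. The GAV part defines `Customer` as a join over the sources, so a query is answered by evaluating that view. The LAV part treats each source as a sound projection of `Customer`, builds a canonical global instance with labeled nulls for unknown values, and keeps only answers that contain no nulls.

```python
from itertools import count

# Hypothetical source data, invented for this sketch.
crm = [(1, "Ann"), (2, "Bob")]        # crm(id, name)
billing = [(1, "Rome"), (3, "Oslo")]  # billing(id, city)


# GAV: the global relation is defined as a view over the sources.
def customer_gav():
    """Customer(id, name, city) := crm JOIN billing on id (assumed mapping)."""
    city_by_id = dict(billing)
    return [(i, name, city_by_id[i]) for i, name in crm if i in city_by_id]


# LAV: each source is defined as a view over the global schema.
# Assumed mappings:
#   crm(id, name)     is contained in the projection of Customer on (id, name)
#   billing(id, city) is contained in the projection of Customer on (id, city)
class Null:
    """Labeled null standing for a value the sources do not provide."""
    _ids = count()

    def __init__(self):
        self.n = next(Null._ids)

    def __repr__(self):
        return f"_N{self.n}"


def customer_lav():
    """Canonical global instance obtained by inverting the source views."""
    rows = [(i, name, Null()) for i, name in crm]
    rows += [(i, Null(), city) for i, city in billing]
    return rows


def answer(customer_rows):
    """Query q(id, name) :- Customer(id, name, city); drop tuples with nulls."""
    return sorted(
        {(i, n) for i, n, _ in customer_rows
         if not isinstance(i, Null) and not isinstance(n, Null)}
    )


print(answer(customer_gav()))  # only ids present in both sources
print(answer(customer_lav()))  # every (id, name) pair known from crm
```

The two runs return different answers because the two mappings encode different assumptions about how the sources relate to `Customer`, not because one approach is more correct. The sketch also ignores constraints on the global schema; adding, say, a key on `id` would change which LAV answers are certain, which is the dependence on constraints that the abstract points out.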

Similar resources

DSE '03 Preface

Decision systems play an ever increasing role in today's enterprises. The design of these specific information systems requires much expertise. The maturity of methodologies for guiding the designers is not sufficient, leading to systems requiring costly maintenance. Therefore, the purpose of this first International Workshop on Decision Systems Engineering (DSE'03) is to enable participants to...

Integration and dimensional modeling approaches for complex data warehousing

With the broad development of the World Wide Web, various kinds of heterogeneous data (including multimedia data) are now available to decision support tasks. A data warehousing approach is often adopted to prepare data for relevant analysis. Data integration and dimensional modeling indeed allow the creation of appropriate analysis contexts. However, the existing data warehousing tools are wel...

Warehouse Creation - A Potential Roadblock to Data Warehousing

Data warehousing is gaining in popularity as organizations realize the benefits of being able to perform sophisticated analyses of their data. Recent years have seen the introduction of a number of data-warehousing engines, from both established database vendors and new players. The engines themselves are relatively easy to use and come with a good set of end-user tools. However, there i...

Warehousing complex data from the web

The data warehousing and OLAP technologies are now moving onto handling complex data that mostly originate from the Web. However, integrating such data into a decision-support process requires their representation under a form processable by OLAP and/or data mining techniques. We present in this paper a complex data warehousing methodology that exploits XML as a pivot language. Our approach inc...

Quality-Aware Integration and Warehousing of Genomic Data

In human health and life sciences, researchers extensively collaborate with each other, sharing biomedical and genomic data and their experimental results. This necessitates dynamically integrating different databases or warehousing them into a single repository. Based on our past experience of building a data warehouse called GEDAW (Gene Expression Data Warehouse) that stores data on genes exp...

Publication date: 2003